Comparing the Performance of CMCC-BioClimInd and WorldClim Datasets in Predicting Global Invasive Plant Distributions

Simple Summary: Species distribution models are widely used to predict the distribution of invasive plant species from bioclimatic variables. However, the choice of bioclimatic variables may affect model performance. Here, we tested a new bioclimatic variable dataset, CMCC-BioClimInd, for simulating the worldwide distribution of invasive plant species. We evaluated the predictive performance and explanatory power of WorldClim and CMCC-BioClimInd using AUC and omission rate, and documented CMCC-BioClimInd following the ODMAP protocol to ensure reproducibility. The results indicate that CMCC-BioClimInd can effectively simulate the distribution of invasive plant species. Based on the contribution rates of the CMCC-BioClimInd variables to the distribution of invasive plant species, we inferred that the modified simplified continentality index and the modified Kira warmth index had strong explanatory power. Under the 35 bioclimatic variables of CMCC-BioClimInd, alien invasive plant species are mainly distributed in equatorial, tropical, and subtropical regions. Our research provides a new perspective for risk assessment and management of global invasive plant species.

Abstract: Species distribution modeling (SDM) has been widely used to predict the distribution of invasive plant species based on bioclimatic variables. However, the specific selection of these variables may affect the performance of SDM. This study evaluates a new bioclimatic variable dataset, CMCC-BioClimInd, for use in SDM. The predictive performance of SDMs built with WorldClim and with CMCC-BioClimInd was evaluated by AUC and omission rate, and the explanatory power of both datasets was assessed by the jackknife method. Furthermore, CMCC-BioClimInd was documented following the ODMAP protocol to ensure reproducibility. The results indicated that CMCC-BioClimInd effectively simulates the distribution of invasive plant species. Based on the contribution rates of the CMCC-BioClimInd variables to the distribution of invasive plant species, it was inferred that the modified simplified continentality index and the modified Kira warmth index had strong explanatory power. Under the 35 bioclimatic variables of CMCC-BioClimInd, alien invasive plant species are mainly distributed in equatorial, tropical, and subtropical regions. This approach has great potential to improve the efficiency of species distribution modeling, thereby providing a new perspective for risk assessment and management of global invasive plant species.


Introduction
Invasive plant species (IPS) are a global problem affecting agriculture, forestry, fisheries, human health, and natural ecosystems [1-3]. Climate change affects the identified IPS niche, thereby affecting their regional and global distributions [4,5]. The global average that CMCC-BioClimInd is accurate, comprehensive, and effective for predicting invasive species distributions and simulating climate change [46]. To our knowledge, CMCC-BioClimInd has not previously been used to predict invasive plant distributions or applied in practice, which makes the present application novel. WorldClim is the dataset most commonly used to predict the distributions of species, including potential invasive plant species (e.g., [2,15,32,49]).
Although species distribution models are widely used, the reproducibility of SDM methods is often limited by the lack of reporting standards and by the uncertainty of their predictions [7,30,31]. Therefore, we used the ODMAP scheme to improve the transparency and repeatability of this research [50,51]. The ODMAP (overview, data, model, assessment, and prediction) reporting protocol provides a standardized way to communicate SDM results and outputs by describing objectives, model assumptions, scaling issues, data sources, model workflows, model predictions, and uncertainties [50,51]. The ODMAP protocol has two main purposes. First, it gives authors a checklist detailing the key steps of model construction and analysis. Second, it introduces a standard documentation method to ensure transparency and repeatability [50,51]. Here, we tested the CMCC-BioClimInd dataset, described its basic elements, and documented its metadata following the ODMAP protocol (Table S1).
We selected the 11 most representative plant species from the list of the 100 most dangerous alien invasive species in the world. We introduced a new global dataset of bioclimatic indicators to predict the distribution of invasive species and compared it with WorldClim, the most widely used dataset, to verify the prediction quality of the CMCC-BioClimInd dataset. This study specifically aimed to (a) compare the performance of WorldClim and CMCC-BioClimInd in predicting invasive species distributions using AUC values; (b) identify the CMCC-BioClimInd variables that most strongly affect the distribution of invasive species; (c) determine the potential IPS distributions based on CMCC-BioClimInd; and (d) evaluate which bioclimatic variables cause differences in IPS distributions among the first 19 variables shared by WorldClim and CMCC-BioClimInd.

Occurrence Data
A list of the world's most invasive non-native species has been compiled by the expert group on invasive species; the 11 most important IPS on it [52] were Ligustrum robustum, Cinchona pubescens, Morella faya, Miconia calvescens, Cecropia peltata, Spathodea campanulata, Melaleuca quinquenervia, Schinus terebinthifolia, Acacia mearnsii, Leucaena leucocephala, and Pinus pinaster. The occurrence records of these species were downloaded from the Global Biodiversity Information Facility (GBIF) (Figure 1). We downloaded species distribution data for 1970 to 1999, because this is the only period covered by both climate datasets. The downloaded data were processed as follows: (1) records of inaccurately identified or heteronymous species were carefully checked and screened out; (2) records with the same longitude and latitude were deleted; (3) duplicate records within the same cell at the chosen spatial resolution were deleted [2,51]. Finally, a total of 390,000 geographical coordinate points of these 11 species were included in our analysis [53,54].
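Cleaning steps (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual workflow: the function name `clean_occurrences`, the record layout, and the 0.5° cell size (chosen here to match the climate grid) are all assumptions.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical record layout: (species name, longitude, latitude) in decimal degrees.
Record = Tuple[str, float, float]


def clean_occurrences(records: Iterable[Record], cell_size: float = 0.5) -> List[Record]:
    """Drop exact coordinate duplicates, then keep one record per species per grid cell."""
    seen_coords = set()
    seen_cells = set()
    kept: List[Record] = []
    for species, lon, lat in records:
        # Discard coordinates that cannot exist on the globe.
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            continue
        # Step (2): remove records with identical longitude and latitude.
        coord_key = (species, lon, lat)
        if coord_key in seen_coords:
            continue
        seen_coords.add(coord_key)
        # Step (3): keep only the first record that falls in each grid cell.
        cell_key = (species, math.floor(lon / cell_size), math.floor(lat / cell_size))
        if cell_key in seen_cells:
            continue
        seen_cells.add(cell_key)
        kept.append((species, lon, lat))
    return kept
```

Keeping the first record per cell is an arbitrary choice; any rule that leaves one point per species per cell reduces spatial duplication in the same way.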

Data of Bioclimatic Variables
CMCC-BioClimInd is based on the daily time series of temperature and precipitation available in the Water and Global Change (WATCH) forcing data set, which is derived from the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis (ERA-40) [55] and is described in detail for the historical period (http://www.eu-watch.org/data_availability, accessed on 4 April 2022). The CMCC-BioClimInd data set was obtained from https://doi.org/10.1594/PANGAEA.904278 (accessed on 5 April 2022; [46]). We obtained a set of 35 bioclimatic variables with a spatial resolution of 0.5° × 0.5° (1960-1999), covering the entire world (excluding Antarctica) [46]. WorldClim data were downloaded from https://www.worldclim.org (accessed on 6 April 2022). The 19 WorldClim bioclimatic variables (1970-2000) [29,35,36] have a spatial resolution of 5 arc minutes (about 10 km × 10 km), so we resampled them to 0.5° resolution to match the resolution of the CMCC-BioClimInd variables. The details of the bioclimatic variables are shown in Table S2.
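Because 0.5° equals 30 arc minutes, each 0.5° cell covers a 6 × 6 block of 5-arc-minute cells. One simple way to resample is therefore to average each block. The text does not state which resampling method was used, so block averaging, the name `aggregate_mean`, and the use of `None` for missing cells are assumptions in this sketch.

```python
from typing import List, Optional

# A raster as a list of rows; None marks a cell without data (for example ocean).
Grid = List[List[Optional[float]]]


def aggregate_mean(grid: Grid, factor: int = 6) -> Grid:
    """Coarsen a raster by averaging non-missing values in factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the aggregation factor")
    coarse: Grid = []
    for r0 in range(0, rows, factor):
        coarse_row: List[Optional[float]] = []
        for c0 in range(0, cols, factor):
            values = [
                grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
                if grid[r][c] is not None
            ]
            # A block with no valid cells stays missing in the coarse raster.
            coarse_row.append(sum(values) / len(values) if values else None)
        coarse.append(coarse_row)
    return coarse
```

With the default `factor=6`, a global 5-arc-minute raster of 2160 × 4320 cells becomes a 360 × 720 raster at 0.5°.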

We ran the Maxent model four times, once for each set of climate variables: the complete CMCC-BioClimInd (bio1-bio35), CMCC-BioClimInd (bio1-bio19), CMCC-BioClimInd (bio20-bio35), and WorldClim. These four runs allowed us to compare the performance of CMCC-BioClimInd and WorldClim directly and to verify the accuracy of the predicted invasive species distributions. The CMCC-BioClimInd (bio1-bio19) run showed how differences in the shared climate variables, relative to WorldClim, affected the predicted invasive species distributions. The CMCC-BioClimInd (bio20-bio35) run revealed the effect of the new variables on invasive species distributions independently of the first 19 variables, and thus tested the effectiveness of the new variables more fully.
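The four variable sets can be written down explicitly, which makes it easy to check that the two CMCC-BioClimInd subsets are disjoint and together make up the full set. The dictionary name `RUNS` and the helper `bio_names` are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple


def bio_names(first: int, last: int) -> List[str]:
    """Return variable names bio<first> .. bio<last>, inclusive."""
    return [f"bio{i}" for i in range(first, last + 1)]


# Run label -> (source dataset, variables used in that run).
RUNS: Dict[str, Tuple[str, List[str]]] = {
    "CMCC-BioClimInd (bio1-bio35)": ("CMCC-BioClimInd", bio_names(1, 35)),
    "CMCC-BioClimInd (bio1-bio19)": ("CMCC-BioClimInd", bio_names(1, 19)),
    "CMCC-BioClimInd (bio20-bio35)": ("CMCC-BioClimInd", bio_names(20, 35)),
    "WorldClim (bio1-bio19)": ("WorldClim", bio_names(1, 19)),
}

for label, (dataset, variables) in RUNS.items():
    print(f"{label}: {len(variables)} variables from {dataset}")
```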

Modelling Approach and Evaluation
The Maxent model uses species occurrence data and relevant environmental variables to model species distributions under climate change [56,57]. Here, we established a logistic regression model with the distribution data of the 11 IPS as response variables and ran the Maxent model four times, once with each of the four sets of climate variables: WorldClim, CMCC-BioClimInd (bio1-bio35), CMCC-BioClimInd (bio1-bio19), and CMCC-BioClimInd (bio20-bio35). The IPS distribution data were randomly divided into a training set (75%, used to calculate the training AUC) and a test set (25%, used to calculate the test AUC). The regularization multiplier was set to two and the number of replicates to four [56,58,59].
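The random 75%/25% partition and the four replicates can be sketched as follows. This is only an illustration of the idea: Maxent performs its own partitioning internally, and the function name, the fixed seeds, and the assumption that the larger part is used for training are choices made here.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_occurrences(
    points: Sequence[T], train_fraction: float = 0.75, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly split occurrence points into a training part and a test part."""
    rng = random.Random(seed)  # seeded so that each replicate is reproducible
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Four replicates, each with a different random partition of the same points.
points = list(range(100))  # placeholder identifiers for occurrence records
replicates = [split_occurrences(points, seed=s) for s in range(4)]
```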
We used the area under the curve (AUC) of the receiver operating characteristic (ROC) to evaluate the prediction accuracy of the species distribution model. The AUC takes each value of the prediction result as a possible threshold, and then obtains the corresponding sensitivity and specificity values to calculate the curve [58]. The greater the AUC value, the greater the deviation between the predicted species distribution and a random distribution (i.e., AUC = 0.5; [57,59]). The greater the correlation between variables and models, the higher the accuracy of the models. An AUC > 0.7 indicates that the model is effective [57]. We also used the omission rate as an evaluation metric. The omission rate is the proportion of evaluation localities that fall outside the predicted area once the model output is converted to a binary prediction [60,61]. The omission rate provides information about discrimination and overfitting at a specific threshold. Generally speaking, the lower the omission rate, the higher the performance [60,61]. We therefore evaluated model performance using both the AUC and the omission rate.
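Both metrics are simple to compute from predicted suitability scores. The sketch below uses the rank-based form of the AUC, that is, the probability that a randomly chosen presence point scores higher than a randomly chosen background point, with ties counted as one half. The function names and the example threshold are assumptions, and Maxent reports these values itself.

```python
from typing import Sequence


def auc(presence_scores: Sequence[float], background_scores: Sequence[float]) -> float:
    """Rank-based AUC: P(presence score > background score), ties count 0.5."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def omission_rate(test_presence_scores: Sequence[float], threshold: float) -> float:
    """Share of test presences predicted absent after thresholding the output."""
    omitted = sum(1 for s in test_presence_scores if s < threshold)
    return omitted / len(test_presence_scores)
```

A model that ranks every presence above every background point gives an AUC of 1.0, and a model with no discrimination gives 0.5, the random baseline mentioned above.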

Effects of Bioclimatic Variables on Global Invasive Plant Species Distributions
Firstly, we used the jackknife method to assess the contribution of each bioclimatic variable to the species' distribution probability. The jackknife output gives the contribution of each bioclimatic variable in a data set to the distribution probability, with values ranging from 0 (the smallest contribution) to 100% (the largest contribution) [61,62]. Secondly, we used an independent-sample t-test [62] to compare the contribution rates of the 19 bioclimatic variables in WorldClim (bio1-bio19) and CMCC-BioClimInd (bio1-bio19). This test evaluated whether the average contribution rate of each of the first 19 bioclimatic variables to the distributions of IPS differed between the two models. Finally, after running Maxent, we generated ASCII files for both models (CMCC-BioClimInd and WorldClim). In GIS, we used raster arithmetic to subtract the WorldClim invasive plant distribution probability map from the CMCC-BioClimInd map [46], which gave the distribution difference map for the 11 invasive plants. Positive values indicated that the distribution probability predicted by CMCC-BioClimInd in a specific area was higher than that predicted by WorldClim [46]. The opposite was true for negative values [46].
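The map subtraction amounts to a cell-by-cell difference of two aligned probability rasters. The following sketch assumes both grids have already been read into nested lists with the same shape and that missing cells carry the value -9999, a common convention in ASCII grids; the function name and the NODATA value are assumptions, and the authors performed this step in GIS software.

```python
from typing import List

NODATA = -9999.0  # assumed missing-data marker


def difference_map(
    cmcc: List[List[float]], worldclim: List[List[float]], nodata: float = NODATA
) -> List[List[float]]:
    """Return CMCC-BioClimInd minus WorldClim probability for each cell."""
    if len(cmcc) != len(worldclim) or any(
        len(a) != len(b) for a, b in zip(cmcc, worldclim)
    ):
        raise ValueError("the two rasters must have identical dimensions")
    result = []
    for row_c, row_w in zip(cmcc, worldclim):
        result.append(
            [
                # A cell missing in either map is missing in the difference map.
                nodata if (c == nodata or w == nodata) else c - w
                for c, w in zip(row_c, row_w)
            ]
        )
    return result
```

Positive cells then mark areas where CMCC-BioClimInd predicts a higher probability than WorldClim, and negative cells the reverse.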
We used the complete CMCC-BioClimInd (bio1-bio35) to analyze the average contribution of each variable to the distributions of the 11 invasive species. Surprisingly, the average contribution rate of the 16 new variables (bio20-bio35) to invasive species distribution reached 56.732%, while the original 19 variables reached only 43.268% (Table 2). Three variables each contributed more than 10% to invasive species distribution: the modified Kira warmth index (bio26, 13.883%), the simplified continentality index (bio27, 12.322%), and bio4 (10.774%) (Table 2). To check the reliability of these 16 variables for predicting invasive species distribution and to make the results more intuitive, we ran CMCC-BioClimInd (bio20-bio35) separately. The results were consistent with the bioclimatic variable contribution rates of the complete CMCC-BioClimInd set. Moreover, bio26 and bio27 contributed markedly more to invasive species distribution (Table 2).

Distribution Probability of Invasive Species
The species' distribution ranges predicted by the four Maxent runs with different climate variables were roughly similar and were concentrated in the same regions. However, they were not completely consistent (Figure 2). For these 11 species, the distribution ranges predicted by CMCC-BioClimInd (bio1-bio19) were significantly larger than those predicted by WorldClim (bio1-bio19) (Figure 2). Moreover, the distribution ranges predicted with the WorldClim climate variables were larger than those predicted with CMCC-BioClimInd; however, the differences were small (Figures 1 and 3). In contrast, CMCC-BioClimInd and CMCC-BioClimInd (bio20-bio35) predicted similar invasive plant distributions and distribution probabilities (Figure 2). The complete CMCC-BioClimInd set was taken as the reference since it was more comprehensive and accurate for invasive species prediction.

Acacia mearnsii is distributed mainly in western and eastern South America, southern Australia, eastern Africa, and the western Mediterranean (Figure 2). Cecropia peltata is primarily located near the equator, especially in northern South America, with a high distribution probability (Figure 2). Cinchona pubescens is distributed mainly in western South America and central Africa, and its main distribution range is also near the tropics (Figure 2). Leucaena leucocephala is widely distributed in the southern hemisphere, specifically in central Africa, the Maldives, northern and southern South America, southern Asia, and northeastern and western Australia (Figure 2). Ligustrum robustum is mainly distributed in southeast China, Southeast Asia, and eastern India. Melaleuca quinquenervia is distributed mainly near the Tropic of Capricorn and the Tropic of Cancer, with a small distribution range concentrated primarily in the Maldives and Brazil (Figure 2). Miconia calvescens is distributed mainly between the equator and the Tropic of Cancer in Brazil, Peru, and central Africa (Figure 2). Morella faya showed a relatively scattered, small distribution range and affects only a few countries, being primarily distributed in southwest Spain, the Azores, Madeira, and the Canary Islands (Figure 2). Pinus pinaster is mainly distributed in the Mediterranean basin, southern Australia, and northeastern New Zealand, with a small distribution range (Figure 3). Schinus terebinthifolia is distributed near the Tropic of Cancer and the Tropic of Capricorn, mainly in eastern Brazil and eastern Africa (Figure 2). Spathodea campanulata is widely distributed and concentrated between the Tropic of Cancer and the Tropic of Capricorn, mainly in central Africa, north-central South America, southern Asia, and southern North America (Figure 2).

Differences in the Distribution Probability of Invasive Plant Species Predicted by the WorldClim and CMCC-BioClimInd Datasets
The differences in invasive plant distributions predicted by the two models for the 11 species were concentrated in the main distribution sites. Globally (except Antarctica), the invasive distribution probability maps predicted by the two models were roughly similar, and the area of difference was relatively small. The differences were concentrated in the Himalayas, Malaysia, and the Mediterranean (Figure 3). The species distribution probability predicted by CMCC-BioClimInd was greater than that predicted by WorldClim in the Himalayas for Leucaena leucocephala, Ligustrum robustum, Melaleuca quinquenervia, and Spathodea campanulata (Figure 3). The distribution probability in Malaysia predicted by CMCC-BioClimInd was also higher than that of the WorldClim model for Acacia mearnsii, Leucaena leucocephala, Melaleuca quinquenervia, Miconia calvescens, Morella faya, Schinus terebinthifolia, and Spathodea campanulata (Figure 3). However, for Acacia mearnsii, Morella faya, and Pinus pinaster, the probability predicted by WorldClim was higher than that predicted by CMCC-BioClimInd in the potential Mediterranean basin distribution (Figure 3).


Discussion
This study introduced a novel global bioclimatic index dataset [46] to predict the distribution of 11 invasive species. The 35 climate variables of CMCC-BioClimInd were divided into three datasets, and WorldClim was added as a fourth to run the model. Comparing the AUC and omission rates of the four models showed that the average AUC of all four models was higher than 0.94. The omission rate for CMCC-BioClimInd was less than 0.2, while the omission rate for WorldClim was less than 0.25. Therefore, CMCC-BioClimInd was very effective and accurate in predicting the distribution probability of IPS (Table 1).
WorldClim's climate dataset was derived by interpolating station data, while CMCC-BioClimInd's was derived from climate reanalysis and 11 CMIP5 climate simulations [45,63]. The comparison revealed that the following variables contributed greatly to the invasive species distributions: bio4, bio11, and bio3 from WorldClim, and bio4 and bio1 from CMCC-BioClimInd (bio1-bio19) (Table 2). The literature indicates a high correlation between bio3 and bio11 [50,64], so the very high contribution rates of both variables in WorldClim may not be appropriate to interpret independently. Among the climate variables of WorldClim and CMCC-BioClimInd (bio1-bio19), the contribution rates of bio1, bio10, and bio17 to invasive species distribution were quite different (Table S3). Notably, the differences mentioned above were mainly in areas where the variable estimates were less accurate due to the paucity of ground observations [65], and some artifacts may have arisen from the interpolation function used to create the spatial gridded dataset [46]. A possible explanation for the observed differences between WorldClim and CMCC-BioClimInd is the different weights given to observations relative to climate modeling data when the datasets were created [46].
CMCC-BioClimInd has 35 environmental variables [46,66]: the first 19 are the same as those of WorldClim, and the remaining 16 are additional climate variables not included in WorldClim [46]. Our results showed that among the 35 environmental variables, bio26, bio27, and bio4 made the highest contributions to invasive species distribution (Table 2). Furthermore, when the CMCC-BioClimInd (bio20-bio35) variables were compared with the CMCC-BioClimInd (bio1-bio19) variables, the 16 additional variables dominated the invasive species distribution probabilities, with a contribution rate of about 57%, while the first 19 variables contributed only about 43% (Table 2). In CMCC-BioClimInd, bio27 and bio26 contributed significantly to invasive species distributions (Table 2), indicating the importance of the last 16 variables for invasive species distributions.
Although temperature and precipitation had different effects on invasive species distributions, which is inconsistent with previous studies [2,3], we found that bio27 and bio26 had the most pronounced effect on the distribution of invasive species (Table 2). Apart from the variation of solar radiation with latitude, research has revealed that continentality is the most important factor controlling local variation in Earth's climate and affecting plant growth [67]. Generally, various factors at the Earth's surface influence the fluxes of radiation, heat, and moisture at the air-land and air-water interfaces; these affect weather aspects such as temperature, precipitation, and cloudiness [67-69]. Additionally, previous studies have shown that plant growth in forest communities is affected by the thermal climate to some extent; for example, aboveground plant height, plant biomass, and the degree of forest canopy multi-layering tend to increase toward warmer regions [70]. In addition, the diversity of the component plant flora is extremely sensitive to changes in the thermal climate [70-72]. Plant growth requires more than just precipitation and temperature [73]. For example, studies have shown that although some areas receive sufficient precipitation and light, many factors, such as a high evaporation rate, serious water loss, and radiation, may result in poor species growth [43,49].
We consider that CMCC-BioClimInd can predict invasive plant distributions more comprehensively and accurately [46,73]. Other studies revealed that the environmental variables that best predict species distributions are not consistent [74,75]. The contribution rates differed among species and were associated with species habitat and, to a certain extent, with the mutual restriction between organisms [73]. For example, Dingle et al. (2000) found that annual rainfall and soil moisture explained 90% and 62% of migratory butterfly species richness in the arid centre of Australia, respectively [76]. However, these two factors were not significant for butterflies in the rainy areas of eastern Australia [76], where temperature seasonality was the best single climate predictor of butterfly species richness [76].
We also observed that the 11 IPS were mainly distributed in tropical rainforests and grasslands, subtropical evergreen broad-leaved forests, and the Mediterranean region (Figure 2). These climate regions have abundant species resources, sufficient rain and heat, and rich forest resources [77]. Biodiversity is also richer in places with abundant plant resources [77]. For example, the invasive Morella faya is highly scattered throughout a narrow distribution area [78]. The introduction of fruit-eating birds promoted the spread of this fast-growing plant, which quickly formed dense stands that endangered local plant growth [78].
Temperature and precipitation are essential for IPS biology [2-5]. On a large spatial scale, the tolerance of invasive plants is usually linked with climate and the main habitat [6,75]. With climate change, invasive species from adjacent areas may cross national borders and become new biota elements [2,3]. Invasive species threaten plant growth in local habitats by competing with local vegetation, replacing grassland communities, reducing local biodiversity, and increasing water loss in riparian zones [4,13]. Therefore, CMCC-BioClimInd, with its 35 climate variables, can be used to predict potential changes in invasive species distributions under future climates and to help implement timely preventive measures [46]. Otherwise, the economic losses caused by invasive plants and their negative impacts on food security, biodiversity, and ecosystem services may soon increase sharply [1,77].
We found a significant difference in the IPS distribution probabilities predicted by CMCC-BioClimInd and WorldClim near the Himalayas. According to the literature, temperature and precipitation in the Himalayas were higher in CMCC-BioClimInd than in WorldClim [46]. Therefore, the invasive plant distribution probabilities based on CMCC-BioClimInd were also higher in the Himalayas. Notably, large differences in distribution probability exist near the equator, the Mediterranean, Malaysia, and western and eastern Australia (Figure 3). The quality of the climate data may also affect species distribution modelling [49]. To predict a reasonable future distribution, the bioclimatic variables used in a species distribution model must be reliable [49]. We propose the following explanations for the differences in invasive species distributions between the two data sets: (1) the CMCC-BioClimInd data set has 16 additional climate variables that have a greater impact on the distribution of invasive species than the WorldClim variables [35,46]; (2) the sources of the bioclimatic variables are different [35,36,46], and coarse-scale bioclimatic information may be insufficient or inconsistent for species distribution models derived from finer-scale species occurrence data [49]; (3) the main distribution areas of the invasive species may lie in regions where the variable estimates were less accurate due to the paucity of ground observations. Furthermore, the observed differences between WorldClim and CMCC-BioClimInd might result from the different weights given to observations relative to climate modeling data when the datasets were created [35,46]. To limit IPS damage to global biodiversity, safety, and the economy, effective measures must be taken to prevent their further expansion [2,4,5]. Our research also provides a new perspective on global IPS risk assessment and management.
Our research also has some limitations. (1) The multicollinearity between variables in the CMCC-BioClimInd and WorldClim datasets was not addressed. Correlation analysis compares two or more related variables to measure the degree of correlation between them [79]. The datasets contain many indicators, and most overlap with each other, resulting in large redundancy in the data. Using many similar variables does not yield proportionally more information and can lead to serious overfitting [80,81]. When environmental variables are selected for MaxEnt, their number can be changed because variables differ in their ability to determine species distributions [82]. The number of environmental variables largely affects MaxEnt's ability to simulate the distribution of invasive plants by altering the model's complexity [82]. Therefore, future research should address the multicollinearity between variables and select the environmental variables that are most important for the species distribution under study. (2) The number of replicates set in the species distribution model was too low. Recent research on species distribution modelling has shown that MaxEnt often produces better classification results when users choose the optimal parameters [83]. Model parameterization significantly affects the prediction accuracy of MaxEnt; therefore, appropriate parameterization is closely associated with good classification results [84]. Thus, in future research, the number of iterations for the Maxent model will be set to 10-100. (3) Using only AUC and omission rates to determine MaxEnt's modeling performance does not guarantee the best model predictions. With imbalanced datasets, AUC may be misleading because the numbers of positive and negative samples are uneven, and in this case AUC may overestimate the classifier's performance. In addition, the AUC score ignores the actual probability values, which makes it insensitive to changes in predicted probability that preserve the ranking, and the testing performance of ROC in spatial regions is rarely successful [85,86]. Additional performance evaluation indicators (e.g., TSS, Kappa, null models for significance testing, the Boyce index, and block cross-validation) should be included in future studies [87-90]. (4) Although MaxEnt is widely used to simulate plant invasion [91], research has shown that the number of species records, the number of environmental variables, and the spatial scale all affect the performance of the MaxEnt distribution model, which indicates that these three inputs can lead to uncertainty in MaxEnt predictions for invasive plants [80]. In future research, we can use models such as Maxlike and general linear models to evaluate invasive plants.
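Two of the additional indicators mentioned above, the true skill statistic (TSS) and Cohen's Kappa, are computed from the confusion matrix of a thresholded prediction. The sketch below uses the standard definitions; the function names and the example counts are illustrative assumptions and do not come from this study.

```python
def tss(tp: int, fn: int, fp: int, tn: int) -> float:
    """True skill statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)  # share of presences predicted present
    specificity = tn / (tn + fp)  # share of absences predicted absent
    return sensitivity + specificity - 1.0


def cohen_kappa(tp: int, fn: int, fp: int, tn: int) -> float:
    """Cohen's Kappa: agreement beyond what is expected by chance."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predictions and observations.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts: 50 presences and 50 absences in an evaluation set.
print(tss(tp=40, fn=10, fp=5, tn=45))
print(cohen_kappa(tp=40, fn=10, fp=5, tn=45))
```

Unlike Kappa, TSS does not depend on prevalence, which is one reason it is often reported alongside AUC for presence-background models.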

Conclusions
The CMCC-BioClimInd dataset improves on existing global bioclimatic datasets used for SDM. It is a pragmatic compromise that addresses some of the limitations of the currently available products and is accurate for predicting invasive species distributions. In the rapidly changing global environment, bioclimatic species modelling has become an important tool for answering many conservation biology and invasion ecology questions. The CMCC-BioClimInd dataset can provide a wide range of core functions for these models. When all 35 CMCC-BioClimInd climate variables were combined, bio27 and bio26 were found to greatly affect invasive species distributions. Furthermore, the invasive species were mainly distributed in areas with sufficient rain and heat, such as tropical rainforests and grasslands, and subtropical evergreen broad-leaved forests. Therefore, policymakers must reinforce the management of areas vulnerable to the six kinds of invasion and formulate effective strategies to prevent invasive plant expansion.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biology12050652/s1, Table S1: Description of the CMCC-BioClimInd dataset according to the ODMAP protocol; Table S2: Climate variables in the CMCC-BioClimInd dataset, including codes, full names, and units (bio1-bio19 are the same as in the WorldClim dataset); Table S3: The significance of differences in the average contribution rate (%) of bioclimatic variables to species distribution probability between WorldClim and CMCC-BioClimInd based on the independent-sample t-test.
Author Contributions: J.W. and C.Z. conceived the ideas and designed the study; F.Z. and J.W. managed the data; C.W. and J.W. conducted the fieldwork; C.W., F.Z. and J.W. collaborated with the statistical analysis and interpretation of data; F.Z. wrote the first version of the manuscript with substantial contribution from C.W., C.Z. and J.W. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Project of Qinghai Science and Technology Department (grant number 2020-ZJ-733).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.